TextGrid : Optical Character Recognition (OCR)

The Optical Character Recognition (OCR) workflows allow you to convert scanned images of printed "Fraktur" (blackletter) and "Antiqua" texts into machine-encoded text. You can then edit the resulting text in TextGrid, browse it, or use it for text mining. The OCR service of TextGrid is based on OCRopus. The service is optimized for particular typefaces, and some training may be needed. For more information, please see http://code.google.com/p/ocropus/

If no Project exists yet, a new one must be created. After refreshing the Navigator, the new Project is displayed there and must be selected so that images can be imported into it. Next, open the "File" menu and select "Import local files" to import the images. After refreshing the Navigator again, it displays the imported images. Then open the workflow. On the right side, the views "Input Document for Workflow", "Workflow Results", "Workflow Selection" and "Job Management" appear; they are used in the following steps.

First, a new workflow is needed. It can be created in the Workflow Selection View. Either OCRopus Fractur or Modern can be chosen as the service; afterwards, the default values can be kept. In the last window, the workflow can be named and assigned to a specific Project.

After refreshing the Workflow Selection View, the newly created workflow becomes visible. Double-clicking it displays an empty list in the "Input Document for Workflow" View. The images to be processed can simply be dragged there from the Project on the left.

Now select the workflow in the Workflow Selection View and assign a "Target project". Clicking "Run" starts the recognition; while it is running, its status is displayed in the Job Management View.

When the job has finished, the result is displayed in the Workflow Results View. To open it, right-click the result and choose "Open with > Text editor".
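
As a small illustration of the text mining mentioned at the beginning, the recognized text could be examined with a short Python script outside of TextGrid. This is only a sketch and not part of TextGrid or OCRopus: the function name `word_frequencies` and the sample sentence are hypothetical, and it assumes that the OCR result has been copied or saved as plain text. Because Fraktur prints often use the long s ("ſ"), which may appear in the recognized text, the sketch maps it to a regular "s" before counting.

```python
import re
from collections import Counter


def word_frequencies(text, top=10):
    """Return the `top` most frequent words of `text` with their counts."""
    # Map the long s to a plain "s" and lowercase everything, so that
    # spelling variants of the same word are counted together.
    normalized = text.replace("ſ", "s").lower()
    # \w+ matches runs of letters, digits and underscores (umlauts included).
    words = re.findall(r"\w+", normalized)
    return Counter(words).most_common(top)


# Hypothetical sample standing in for text copied from the Text editor.
sample = "Das Waſſer iſt klar. Das Wasser ist kalt."
print(word_frequencies(sample, top=3))
```

For a real document, the sample string would be replaced by the content of the saved result, for example read from a file with `open(path, encoding="utf-8").read()`.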
